Abstract

We study how the spread of computer viruses, worms, and other 
self-replicating malware is affected by the logical topology of the net- 
work over which they propagate. We consider a model in which each 
host can be in one of 3 possible states - susceptible, infected or removed 
(cured, and no longer susceptible to infection). We characterise how 
the size of the population that eventually becomes infected depends on 
the network topology. Specifically, we show that if the ratio of cure to 
infection rates is larger than the spectral radius of the graph, and the 
initial infected population is small, then the final infected population 
is also small in a sense that can be made precise. Conversely, if this 
ratio is smaller than the spectral radius, then we show in some graph 
models of practical interest (including power law random graphs) that 
the final infected population is large. These results yield insights into 
what the critical parameters are in determining virus spread in net- 
works. 



1 Introduction 

Computer viruses and worms are self-replicating pieces of code that
propagate in a network. The essential difference between them is that a
virus typically needs some form of human intervention, such as opening an
email attachment or executing some software, to cause it to be replicated,
whereas worms do not require such intervention. They use a number of
different methods to identify new targets for infection; for example, many
worms scan randomly generated IP addresses to locate vulnerable hosts,
while email viruses send copies of themselves to all addresses in the
address book of the victim. A survey of techniques for target location can
be found in [22].

The particular mechanism chosen by a worm or virus to propagate itself 
defines a topology over which the infection can potentially spread. What 
impact does the topology have on the speed of spread of the epidemic, and 
moreover what are the key features of the topology that determine how viru- 
lent the epidemic is? These are the questions that we address in this paper. 

In this paper, we use a susceptible-infected-removed (SIR) model to de- 
scribe the spread of the epidemic. Here, each susceptible node can be infected 
by its infected neighbours at a rate proportional to their number, and remains 
infected for a deterministic or random time until it is removed. While it is 
infected, it has the potential to infect its neighbours. Removal can corre- 
spond to either (i) patching the computer represented by the node, or (ii) its 
disconnection from the network by some quarantining mechanism, or (iii) the 
exhaustion of its infectious period either by a time-out mechanism or because 
it has tried all its neighbours. Once a node is removed, it cannot become sus- 
ceptible or infected again. Our model ignores the possibility that susceptible 
nodes can also be removed, e.g., because they have received a patch or virus 
signature conferring immunity. This is justified if the timescale for patching 
of susceptible hosts is much larger (happens much more slowly) than that of 
epidemic spread. 

In the context of worms, there has recently been considerable interest in 
automatic mechanisms for detecting whether hosts are infected, and throt- 
tling or quarantining them; see, e.g., j2H|. There has also been work on auto- 
matic generation of self-certifying alerts jU] which are equivalent to patches. 
Thus, it is possible to view removal as happening on the same time scale as 
infection. In the case of viruses, it takes longer to generate virus signatures 
and update antivirus software, but their spread is also slower. Hence, again, 
the model is not unrealistic. 

There is a substantial literature on the SIR model in epidemiology,
starting with the work of Kermack and McKendrick [17]. A commonly used
approach in early work was to approximate a stochastic model by a deter- 
ministic one in a large population (law of large numbers) limit. More recent 
work has considered stochastic aspects, such as obtaining Poisson or normal 
limiting distributions for the number of survivors; see, for example, [31 ITS]. 
A key concept in these studies is the basic reproductive number $R_0$, which
denotes the expected number of secondary infectives caused by a single
primary infective. If $R_0 > 1$, the infection spreads to some sizeable
fraction of the entire population; if $R_0 < 1$, then the fraction eventually
infected is close to zero. The concept of basic reproductive number is easy
to define with
uniform mixing (i.e., when any infective can infect any susceptible equally 
easily) but it is not clear how to apply it to general networks, where this 
number could be different for every node. One approach is to consider net- 
works with special structure, where either nodes or links belong to one of 
a small number of types. This is the approach taken, for example, by Ball 
et al. |B], who consider two-level models where network links can belong to 
one of two types - (i) local, e.g., within a household or (ii) global, between 
households. 

In this paper, we obtain conditions for the number eventually infected 
to be small, in arbitrary networks. Conversely, we obtain conditions for 
the number of infected nodes to be large in some specific network models 
of practical interest, including Erdos-Renyi and power law random graphs. 
The rest of the paper is structured as follows. We introduce the epidemic
spreading model in Section 2, where we also obtain sufficient conditions
for small epidemic size (where the size is defined as the number that ever
become infected). Applications of these results to the star, clique,
Erdos-Renyi graph, and power law graph are found in Section 3. In that
section we also take advantage of results on the giant component for
various families of graphs to give a lower bound on the number of nodes
ultimately removed. Section 4 summarizes the paper and describes further
directions to pursue.

2 Model 

We consider a closed population of $n$ individuals, connected by a
neighbourhood structure which is represented by an undirected, labelled
graph $G = (V, E)$ with node set $V$ and edge set $E$. Each node can be in
one of three possible states: susceptible (S), infective (I) or removed (R).
The initial set of infectives at time 0 is assumed to be non-empty, and all
other nodes are assumed to be susceptible at time 0. The evolution of the
epidemic is described by the following discrete-time model. Let $X_v(t)$
denote the indicator that node $v$ is infected at the beginning of time slot
$t$ and $Y_v(t)$ the indicator that it is removed. Each node that is infected
at the beginning of a time slot attempts to infect each of its neighbours;
each infection attempt is successful with probability $\beta$, independent of
other infection attempts. Each infected node is removed at the end of the
time slot. Thus, the probability that a susceptible node $u$ becomes infected
at the end of time slot $t$ is given by
$1 - \prod_{v \sim u} (1 - \beta X_v(t))$, where we write $v \sim u$ to mean
that $(u, v) \in E$. Note that the evolution stops when there are no more
infectives in the population. At this time, we want to know how many nodes
are removed.
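
The discrete-time dynamics just described can be sketched in a few lines of
Python. This is only an illustrative simulation under our own assumptions:
the function name `reed_frost`, the adjacency-dictionary representation of
the graph and the explicit random number generator argument are choices
made for this example and do not come from the paper.

```python
import random

def reed_frost(adj, initial, beta, rng):
    """Simulate one run of the discrete-time epidemic described above.

    adj     : dict mapping each node to a list of its neighbours (undirected).
    initial : iterable of initially infected nodes.
    beta    : probability that a single infection attempt succeeds.
    rng     : a random.Random instance (hypothetical interface for this sketch).
    Returns the set of removed nodes once no infectives remain.
    """
    infected = set(initial)
    removed = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for u in adj[v]:
                # Only susceptible nodes can be infected; attempts are independent.
                if u not in infected and u not in removed and rng.random() < beta:
                    newly_infected.add(u)
        removed |= infected        # every infective is removed at the end of the slot
        infected = newly_infected
    return removed
```

Averaging `len(reed_frost(...))` over many runs gives a Monte Carlo estimate
of the expected final size of the epidemic for a given graph and initial set.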

The above model is known as the Reed-Frost model. It corresponds to 
a deterministic infectious period which is the same at every node. It is one 
of the earliest stochastic SIR models to be studied in depth, because of its 
analytical tractability. Note that the evolution can be described by a Markov 
chain in this case. Another commonly used model assumes that infectious 
periods are iid and exponentially distributed, so that the system evolves as 
a continuous time Markov process. General infectious periods give rise to 
non-Markovian systems. These are outside the scope of this work. 

The object of interest is the number of nodes that eventually become
infected (and removed) compared to the number initially infected. As noted
earlier, in mean field models of SIR epidemics, the number of nodes removed
exhibits a sharp threshold; as $\beta$ is increased, it suddenly jumps from a
constant (which doesn't depend on $n$) to a non-zero fraction of $n$, the
number of nodes in the system. We wish to ask if a similar threshold is
exhibited on general graphs and, if so, how the critical value of $\beta$ is
related to properties of the graph.

We now state general conditions for the number of nodes removed to be
small. Let $A$ denote the adjacency matrix of the undirected graph $G$, i.e.,
$a_{ij} = 1$ if $(i, j) \in E$ and $a_{ij} = 0$ otherwise. Since $A$ is a
symmetric, non-negative matrix, all its eigenvalues are real, the eigenvalue
with the largest absolute value is positive and its associated eigenvector
has non-negative entries (by the Perron-Frobenius theorem). We denote this
eigenvalue, the spectral radius of $A$, by $\lambda_1$. If the graph is
connected, as we shall assume, then this eigenvalue has multiplicity one,
and the corresponding eigenvector is the only one with all entries being
non-negative.

Theorem 1. Suppose $\beta\lambda_1 < 1$. Then, the total number of nodes
removed, $|Y(\infty)|$, satisfies

$$E[|Y(\infty)|] \le \frac{1}{1 - \beta\lambda_1} \sqrt{n |X(0)|},$$

where $|X(0)|$ is the number of initial infectives. Moreover, if the graph
$G$ is regular (i.e., each node has the same number of neighbours), then

$$E[|Y(\infty)|] \le \frac{1}{1 - \beta\lambda_1} |X(0)|.$$

Proof. In order for an arbitrary node $v$ to be infected at the start of time
slot $t$, there must be a chain of distinct nodes $u_0, u_1, \dots, u_t = v$
along which the infection passes from some initial infective $u_0$ to $v$.
Thus, by the union bound,

$$P(X_v(t) = 1) \le \sum_{u_0, \dots, u_{t-1}} \beta^t X_{u_0}(0),$$

where the sum is taken over nodes $u_0, \dots, u_{t-1}$ such that
$(u_{i-1}, u_i) \in E$ for all $i = 1, \dots, t$, where we take $u_t = v$.
Note that we have not imposed the requirement that the $u_i$ be distinct as
we are only seeking an upper bound. Consequently, the probability that node
$v$ ever gets infected (and hence that $Y_v(\infty) = 1$) is bounded above by

$$P(Y_v(\infty) = 1) \le \sum_{t=0}^{\infty} \sum_{u \in V} (\beta A)^t_{uv} X_u(0),$$

since the $uv$-th entry of the matrix $A^t$ is simply the number of paths of
length $t$ between nodes $u$ and $v$. It is immediate from the above that

$$E[|Y(\infty)|] = \sum_{v \in V} P(Y_v(\infty) = 1) \le \sum_{t=0}^{\infty} \mathbf{1}^T (\beta A)^t X(0),$$

where $\mathbf{1}$ denotes the vector of ones. Now, if $\beta\lambda_1 < 1$,
then we can rewrite the above as

$$E[|Y(\infty)|] \le \mathbf{1}^T (I - \beta A)^{-1} X(0) \le \|\mathbf{1}\| \, \|(I - \beta A)^{-1}\| \, \|X(0)\|, \qquad (1)$$

where $\| \cdot \|$ denotes the Euclidean norm in the case of a vector, and
the matrix or operator norm in the case of a matrix. Now the operator norm
of a symmetric matrix is its spectral radius, the largest of its eigenvalues
in absolute value. Hence $\|(I - \beta A)^{-1}\| = (1 - \beta\lambda_1)^{-1}$.
Moreover, $\|X(0)\| = \sqrt{\sum_{v \in V} X_v(0)^2} = \sqrt{|X(0)|}$.
Likewise, $\|\mathbf{1}\| = \sqrt{n}$. Substituting these in (1) yields

$$E[|Y(\infty)|] \le \frac{1}{1 - \beta\lambda_1} \sqrt{n |X(0)|},$$

which is the first claim of the theorem.

Next, note that by using the spectral decomposition

$$(I - \beta A)^{-1} = \sum_{i=1}^{n} \frac{1}{1 - \beta\lambda_i} x_i x_i^T,$$

where $x_i$ denotes the eigenvector corresponding to the eigenvalue
$\lambda_i$ of $A$, and $x_i^T$ its transpose, we can rewrite (1) as

$$E[|Y(\infty)|] \le \sum_{i=1}^{n} \frac{1}{1 - \beta\lambda_i} \mathbf{1}^T x_i x_i^T X(0). \qquad (2)$$

Now, if $G$ is a regular graph and each node has degree $d$ (i.e., has
exactly $d$ neighbours), then each row sum of its adjacency matrix $A$ is
equal to $d$. Hence, it is clear that the positive vector
$\frac{1}{\sqrt{n}} \mathbf{1}$ is an eigenvector of $A$ corresponding to the
eigenvalue $d$. By the Perron-Frobenius theorem, this is therefore the
largest eigenvalue. Hence, $\lambda_1 = d$, $x_1 = \frac{1}{\sqrt{n}} \mathbf{1}$,
and all other eigenvectors are orthogonal to $\mathbf{1}$. Hence, by (2),

$$E[|Y(\infty)|] \le \frac{1}{1 - \beta\lambda_1} \mathbf{1}^T x_1 x_1^T X(0)
= \frac{1}{n(1 - \beta\lambda_1)} \mathbf{1}^T \mathbf{1} \mathbf{1}^T X(0)
= \frac{1}{1 - \beta\lambda_1} |X(0)|.$$

This is the second claim of the theorem. □

Actually, there is an easier proof. Let $v(i) = P(\text{node } i \text{ is ever infected})$.
Then, $v(i) = 1$ if $i \in I$, where $I$ denotes the set of initial
infectives, and otherwise $v(i) \le \sum_{j \sim i} \beta v(j)$, where we
write $j \sim i$ to mean that $(i, j)$ is an edge. Thus,

$$(I - \beta A) v \le \mathbf{1}_I, \qquad (3)$$

where $\mathbf{1}_I$ denotes the vector with components 1 for $i \in I$ and 0
for $i \notin I$, and the inequality holds in the usual partial order,
namely componentwise. Now, if $\beta\lambda_1(A) < 1$, then we have the power
series expansion

$$(I - \beta A)^{-1} = \sum_{k=0}^{\infty} \beta^k A^k,$$

from which it is immediate that $(I - \beta A)^{-1}$ is a non-negative
matrix. Therefore, we can multiply both sides of the inequality in (3) by
$(I - \beta A)^{-1}$ to obtain

$$v = E[Y(\infty)] \le (I - \beta A)^{-1} X(0),$$

and so

$$E[|Y(\infty)|] \le \mathbf{1}^T (I - \beta A)^{-1} X(0).$$

This is the same as (1), and the proof carries on the same way from there.
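
The quantity $\mathbf{1}^T (I - \beta A)^{-1} X(0)$ appearing in both proofs
can be evaluated numerically by truncating the power series in $\beta A$.
The sketch below does this without forming any matrix; the function name
`neumann_bound`, the adjacency-dictionary input and the fixed number of
terms are assumptions made for illustration, and the truncation is only
meaningful when $\beta\lambda_1 < 1$.

```python
def neumann_bound(adj, beta, initial, terms=200):
    """Return sum_{k < terms} 1^T (beta A)^k X(0).

    This is a truncation of 1^T (I - beta A)^{-1} X(0), the upper bound on
    the expected number of removed nodes; it assumes beta * lambda_1 < 1.
    adj is a dict mapping each node to a list of neighbours (undirected graph).
    """
    x = {v: 0.0 for v in adj}
    for v in initial:
        x[v] = 1.0                      # X(0): indicator of initial infectives
    total = 0.0
    for _ in range(terms):
        total += sum(x.values())        # adds 1^T (beta A)^k X(0)
        # x <- beta * A x; A is symmetric, so (A x)_v sums x over neighbours of v
        x = {v: beta * sum(x[u] for u in adj[v]) for v in adj}
    return total
```

On a regular graph of degree $d$ the series sums to $|X(0)|/(1 - \beta d)$,
so for a ring with $\beta = 0.25$ and one initial infective the function
returns a value very close to 2.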
Remarks. The upper bound in the first claim of the theorem is close to the
best possible in general, as the example of the star-shaped network in
Section 3.1 demonstrates.

The theorem says that, if $\beta\lambda_1 < 1$, then starting from a 'small'
population of initial infectives, the final size of the epidemic is small.
For example, if $|X(0)| = 1$, then the final size of the epidemic is bounded
by a constant in the case of regular graphs, and by a multiple of $\sqrt{n}$
in general. Thus, the fraction of nodes infected goes to zero as $n$ tends to
infinity.
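
As a numerical illustration of the bounds in Theorem 1, the following Python
sketch estimates $\lambda_1$ by power iteration and then evaluates the bound.
The names `spectral_radius` and `theorem1_bound`, the adjacency-dictionary
representation, and the shift by the identity matrix (used so that the
iteration also converges on bipartite graphs such as the star) are
assumptions of this example rather than anything prescribed by the paper.

```python
import math

def spectral_radius(adj, iters=2000):
    """Estimate lambda_1 of a connected undirected graph.

    Power iteration is run on A + I rather than A, so that graphs whose
    spectrum contains -lambda_1 (bipartite graphs) do not cause oscillation.
    """
    nodes = list(adj)
    x = {v: 1.0 for v in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}   # (A + I) x
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient x^T A x of the normalised iterate
        lam = sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)
    return lam

def theorem1_bound(adj, beta, n_initial, regular=False):
    """Upper bound on E|Y(inf)|: sqrt(n |X(0)|) / (1 - beta lambda_1) in
    general, |X(0)| / (1 - beta lambda_1) when the graph is known to be regular."""
    lam = spectral_radius(adj)
    if beta * lam >= 1:
        raise ValueError("the bound requires beta * lambda_1 < 1")
    n = len(adj)
    size = n_initial if regular else math.sqrt(n * n_initial)
    return size / (1 - beta * lam)
```

For a six-node ring with $\beta = 0.25$ and a single initial infective, the
regular-graph bound evaluates to $1/(1 - 0.5) = 2$.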

Note that the proof of the theorem above doesn't require us to assume
that the epidemic be of Reed-Frost type. It works for general infectious
periods $J$, since we are only using expectations throughout, which don't
require independence assumptions. Therefore, the same bounds follow by
repeating the steps of the above proof with $\beta$ replaced by the
probability that a node gets infected by an infected neighbour.

Specifically, if node $u$ is infected, it will infect $j$ if they are
connected and if the time it takes to contact this node, given by an
exponential random variable with parameter $\lambda$, is less than $J$.

Theorem 2. Suppose that $J$ is such that $E[e^{-\lambda J}] < \infty$ and let

$$p_J = 1 - E[e^{-\lambda J}].$$

If $p_J \lambda_1 < 1$ then the total number of nodes removed, $|Y(\infty)|$,
satisfies

$$E[|Y(\infty)|] \le \frac{1}{1 - p_J \lambda_1} \sqrt{n |X(0)|},$$

where $|X(0)|$ is the number of initial infectives. Moreover, if the graph
$G$ is regular (i.e., each node has the same number of neighbours), then

$$E[|Y(\infty)|] \le \frac{1}{1 - p_J \lambda_1} |X(0)|.$$
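
To make the quantity $p_J = 1 - E[e^{-\lambda J}]$ concrete, the short sketch
below evaluates it for two infectious-period distributions chosen by us as
examples: a deterministic period and an exponentially distributed one. The
function names and the parameter `mu` (the rate of the exponential period)
are hypothetical; the closed forms are the standard Laplace transforms of
these two distributions.

```python
import math

def p_deterministic(lam, duration):
    """p_J when J equals `duration` with probability one: 1 - exp(-lam * duration)."""
    return 1.0 - math.exp(-lam * duration)

def p_exponential(lam, mu):
    """p_J when J is exponential with rate mu.

    E[exp(-lam * J)] = mu / (mu + lam), hence p_J = lam / (lam + mu).
    """
    return lam / (lam + mu)
```

Either value can then be compared with $1/\lambda_1$ to check whether the
condition of Theorem 2 holds for a given graph.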
The converse is not true in general. Consider the ring on $n$ nodes. As a
regular graph with degree 2, its adjacency matrix has maximum eigenvalue
2. Nevertheless, for any $\beta < 1$, the size of the epidemic starting from
a single initial infective is bounded by the sum of the sizes of two
independent branching processes, where each branching process has Bernoulli
offspring distribution with parameter $\beta$. (The branching processes
decide if the epidemic will spread one node left or right from its current
position before dying out.) Since each branching process is subcritical, its
final size is finite almost surely and in expectation. Thus, the expected
size of the epidemic is a constant that does not depend on $n$. In other
words, the epidemic is small even if $\beta\lambda_1 = 2\beta > 1$.
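
The ring example can be checked by direct simulation. The sketch below is
our own illustration, with hypothetical function names: it explores the
ring clockwise and then anticlockwise from the initial infective, advancing
one node per successful infection attempt, which reproduces the law of the
final size because each edge is tried at most once. For large $n$ the mean
approaches $1 + 2\beta/(1 - \beta) = (1 + \beta)/(1 - \beta)$, one initial
node plus a geometric number of nodes on each side.

```python
import random

def ring_epidemic_size(n, beta, rng):
    """Final size of the epidemic on an n-node ring with one initial infective."""
    right = 0
    # Spread clockwise until an attempt fails or every other node is infected.
    while right < n - 1 and rng.random() < beta:
        right += 1
    left = 0
    # Spread anticlockwise over the nodes that are still susceptible.
    while left + right < n - 1 and rng.random() < beta:
        left += 1
    return 1 + left + right

def ring_limit_mean(beta):
    """Large-n limit of the expected final size, valid for beta < 1."""
    return (1.0 + beta) / (1.0 - beta)
```

With $\beta = 0.75$ we have $\beta\lambda_1 = 1.5 > 1$, yet the limiting
mean is only 7 nodes regardless of $n$.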

In particular, the SIR epidemic on the ring does not exhibit a sharp
threshold on the open interval (0, 1). On this interval, the final size of
the epidemic is a smooth function of the infectiousness parameter $\beta$,
even in the limit as $n$ tends to infinity. It is shown in Section 3.1 that
a similar result holds for star-shaped networks as well; in fact, there is
no threshold even on the closed interval [0, 1] in this case.

However, while there isn't always a threshold, it turns out that there is
one in many networks of practical interest: there is a lower bound on
$\beta$, above which the epidemic infects a positive fraction of the
population on average. We now illustrate this through several examples.

3 Examples 

3.1 Star-shaped networks 

The star-shaped network is of interest because it illustrates that the bound
in Theorem 1 is close to the best possible for general networks. It also
exhibits a smooth dependence of the final size of the epidemic on the
infectiousness parameter $\beta$, thereby demonstrating that threshold
behaviour doesn't always occur. Finally, understanding the star is important
to understanding certain power-law networks.

Consider the star network, consisting of a hub and $n - 1$ leaves, each of
which is attached only to the hub. Its adjacency matrix $A$ has ones along
the first row and column, except for the (1, 1) element, which is zero; all
other elements are zero. In other words,
$A = \mathbf{1} e_1^T + e_1 \mathbf{1}^T - 2 e_1 e_1^T$, where
$e_1^T = (1, 0, \dots, 0)$. Thus $A$ is a rank-two matrix and can have only
two non-zero eigenvalues. It is readily verified that
$(\sqrt{n-1}, 1, \dots, 1)^T$ and $(-\sqrt{n-1}, 1, \dots, 1)^T$ are
eigenvectors corresponding to the eigenvalues $\sqrt{n-1}$ and $-\sqrt{n-1}$
respectively, and so the spectral radius of $A$ is $\sqrt{n-1}$.

Now suppose $\beta\sqrt{n-1} = c < 1$. Consider the initial condition where
only the hub is infected, so that $|X(0)| = 1$. The number of leaves infected
before the hub is cured is binomial with parameters $n - 1$ and $\beta$. No
other leaves can be infected subsequently. Hence,

$$E[|Y(\infty)|] = 1 + \beta(n - 1) = 1 + c\sqrt{n-1},$$

which is comparable to the upper bound, $\sqrt{n-1}/(1 - c)$, given by
Theorem 1. We also observe in this case that $E[|Y(\infty)|]$ is a smooth
(almost linear) function of $\beta$ and does not exhibit any threshold
behaviour.

Suppose next that the hub is initially uninfected but $k$ leaves are
infected. The hub becomes infected in the next time step with probability
$1 - (1 - \beta)^k$. It subsequently infects a number of leaves which is
binomial with parameters $n - 1 - k$ and $\beta$. The epidemic dies out at
$t = 3$. So, in this case,

$$E[|Y(\infty)|] = k + [1 - (1 - \beta)^k][1 + \beta(n - 1 - k)]
\le k + \beta k [1 + \beta(n - 1 - k)] \le |X(0)|(1 + 2c^2).$$

Thus, when the hub is initially uninfected, the expected final size of the
epidemic is only a constant multiple of the initial number of infectives.
This illustrates that the initial condition can have a big impact in general.
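
The two closed-form expressions for the star are easy to evaluate and
compare. The following sketch simply codes the formulas from this
subsection; the function names are our own and serve only as an
illustration.

```python
def star_hub_infected(n, beta):
    """Expected final size when only the hub is initially infected."""
    return 1 + beta * (n - 1)

def star_leaves_infected(n, beta, k):
    """Expected final size when k leaves (and not the hub) are initially infected."""
    hub_prob = 1 - (1 - beta) ** k          # hub is infected by at least one leaf
    return k + hub_prob * (1 + beta * (n - 1 - k))
```

For instance, with $n = 101$ and $\beta = 0.05$ (so that $c = 0.5$), an
infected hub gives an expected size of 6, whereas a single infected leaf
gives about 1.3; the general upper bound $\sqrt{n-1}/(1 - c)$ equals 20.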

3.2 Complete graph 

A complete graph is one in which an edge is present between every pair of
nodes. Much of the early work on SIR epidemics was based on mean field
models. These are rigorously justifiable only in the case of complete graphs,
which motivates our interest in them. We shall recover the classical result
that the epidemic has a threshold at $R_0 = 1$, where the basic reproduction
number $R_0 = \beta(n - 1)$ is defined as the mean number of secondary
infections caused by a single primary infective, when the entire population
is susceptible. From the perspective of networking applications, the BGP
routers belonging to the top level autonomous systems of the Internet form a
completely connected component. In addition, large ISPs often organize their
internal BGP (iBGP) routers into a set of route reflectors that are
completely connected.
The complete graph is a regular graph with common node degree $n - 1$.
Therefore, its spectral radius is $\lambda_1 = n - 1$, and we have by
Theorem 1 that, if $\beta(n - 1) < 1$, then the final size of the epidemic is
bounded by $1/(1 - \beta(n - 1))$ times the initial number of infectives. We
now establish a converse.

Suppose $\beta(n - 1) = c > 1$ is held constant. (We don't need to assume
this, but the results will need to be restated in terms of the limits
superior and inferior of the sequence $c_n$; it should be clear to the reader
how to do so based on the discussion below.) Let $|X(0)| = 1$ and let $u$ be
the initial infected node. Consider the random subgraph of the complete graph
obtained by retaining each edge with probability $\beta$, independent of all
other edges, and let $C_u$ denote the connected component containing $u$ in
this random graph (possibly just the singleton $\{u\}$). It is clear that
$C_u$ can be interpreted as the set of infected nodes in the epidemic; each
neighbour of $u$ is infected with probability $\beta$, and is hence a
neighbour of $u$ in the random graph described above, and so on iteratively.
Thus, the number of infected nodes in the epidemic has the same probability
law as the size of the component $C_u$.

The above random graph model was introduced by Erdos and Renyi [T3] ; 
we denote it by $G(n, \beta)$, where $n$ denotes the number of nodes, and
$\beta$ the probability that the edge between each pair of nodes is present.
It is also called a Bernoulli random graph because the indicators of edges
are iid Bernoulli random variables.

We now use the following fact, which was established by Erdos and Renyi 
[T3| ; see fUJ Theorem 5.4], for instance, for a more recent reference. Here, 
we assume that $c = \beta(n - 1)$ is held constant while $n \to \infty$, and that $c > 1$.

Theorem 3. Let $\gamma$ be the unique positive solution of
$\gamma + e^{-\gamma c} = 1$. Then, as $n \to \infty$, the size of the largest
connected component in the random graph $G(n, \beta)$ is
$(1 + o(1))\gamma n$, with probability going to 1 as $n$ tends to infinity.

The uniqueness of $\gamma$ follows from the convexity of the function
$f(x) = x + e^{-cx}$ and the fact that $f(0) = 1$, while its existence
follows from the continuity of $f$ and the fact that $f'(0) = 1 - c < 0$,
but that $f(x) \to \infty$ as $x \to \infty$.
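
The constant $\gamma$ has no closed form but is easy to compute. The sketch
below finds the positive root of $\gamma + e^{-\gamma c} = 1$ by bisection,
using exactly the convexity argument above: $f(x) - 1$ is negative just to
the right of 0 and positive at $x = 1$. The function name `giant_fraction`
and the tolerance are assumptions of this example, and the search is not
intended for values of $c$ extremely close to 1.

```python
import math

def giant_fraction(c, tol=1e-12):
    """Positive root gamma of gamma + exp(-gamma * c) = 1, for c > 1."""
    if c <= 1:
        raise ValueError("a positive root exists only for c > 1")

    def g(x):
        return x + math.exp(-c * x) - 1.0

    hi = 1.0                 # g(1) = exp(-c) > 0
    lo = 0.5
    while g(lo) >= 0.0:      # shrink until lo lies in (0, gamma), where g < 0
        hi = lo
        lo /= 2.0
        if lo < 1e-12:
            raise ValueError("c is too close to 1 for this simple search")
    while hi - lo > tol:     # standard bisection on the sign change
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For $c = 2$ the root is approximately 0.797, so the largest component
contains roughly 80% of the nodes.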

We now estimate the size of $C_u$, the connected component containing the
initial infective. If $u$ belongs to the 'giant component', then
$|C_u| = \gamma n$. Since a fraction $\gamma$ of nodes belong to the giant
component, the probability that node $u$ does so is $\gamma$. Hence,
$E[|C_u|] = (1 + o(1))\gamma^2 n$. We have thus shown the following:
Lemma 4. Let $G = (V, E)$ be the complete graph on $n$ nodes, and let
$\beta = \frac{c}{n-1}$ for an arbitrary constant $c > 1$. Then, the final
size of the epidemic satisfies

$$E[|Y(\infty)|] \ge (1 + o(1))\gamma^2 n,$$

for any $|X(0)| \ge 1$, where $\gamma > 0$ solves $\gamma + e^{-\gamma c} = 1$.

There is thus a threshold at c = 1 for the final size of the epidemic; 
starting with a constant number of initial infectives, the final size is a constant 
independent of n if c < 1, and a fraction of n if c > 1. 

3.3 Erdos-Renyi random graphs 

The Erdos-Renyi graph $G(n, p)$ with parameters $n$ and $p$ is defined as a
random graph on n nodes, where the edge between each pair of nodes is 
present with probability p, independent of all other edges. If p = 1, then this 
is the complete graph. 

The spreading behavior of an epidemic on an Erdos-Renyi graph is of 
interest for a number of reasons. First, it is a graph that has received consid- 
erable attention in the past [4]. Second, it is an important component of the 
class of power law random graphs that model the Internet AS graph. Thus if 
we are to understand the robustness of the Internet AS-level graph, we need 
to characterize the robustness of the Erdos-Renyi graph. 

We shall consider a sequence of such graphs indexed by $n$. Denote by
$d$ the corresponding average degree, i.e. $d = (n - 1)p$. Note that $p$ and
$d$ depend on $n$, but this is suppressed in the notation. We say that a
property holds with high probability if its probability goes to 1 as
$n \to \infty$. Define $c_n = \beta d = (n - 1)\beta p$; we have suppressed
the dependence of $\beta$ and $p$ on $n$ in the notation, but make it
explicit in the case of $c$. Consider an SIR epidemic on such a graph
starting with one node initially infected. We have the following:

Lemma 5. If $\limsup_{n \to \infty} c_n < c < 1$, then for all $n$
sufficiently large, $E[|Y(\infty)|]$ is bounded by a constant that does not
depend on $n$. On the other hand, if $\liminf_{n \to \infty} c_n > c > 1$,
then $E[|Y(\infty)|] \ge (1 + o(1))\gamma^2 n$ where $\gamma > 0$ solves
$\gamma + e^{-\gamma c} = 1$.

Proof. Suppose first that $\liminf_{n \to \infty} c_n > c > 1$. As in the case
of the complete graph, we identify the infected individuals in the epidemic
with the connected component containing the initial infective $u$ in an
Erdos-Renyi random graph with parameters $n$ and $\beta p$. (If edge $(u, v)$
is present in the original Erdos-Renyi graph, which happens with probability
$p$, then $u$ succeeds in infecting $v$ with probability $\beta$. This yields
the new graph with edge probability $\beta p$; the independence of the edges
is obvious.) Thus, the second claim of the lemma follows in the same way as
Lemma 4.

The first claim is stronger than what the upper bound of Theorem 1 yields.
Note that, by the Perron-Frobenius theorem, the spectral radius
$\lambda_1(A)$ of the adjacency matrix lies between the smallest and largest
node degree. For the random graph $G(n, p)$, the node degrees are binomial
random variables with parameters $n - 1$ and $p$. If the average node degree
$d = (n - 1)p$ satisfies $d \gg \log(n)$, i.e., $\log(n)/d \to 0$ as
$n \to \infty$, then it can be shown using Chernoff's bound that both the
minimal and maximal node degree are $(1 + o(1))d$ with high probability;
hence, so is the spectral radius. In this case, Theorem 1 yields that, if
$\beta\lambda_1(A) \sim (n - 1)\beta p < c < 1$, then the expected final size
of the epidemic is bounded by a constant times $\sqrt{n}$. To show that it is
in fact bounded by a constant, and that this holds even without the
assumption that $d \gg \log(n)$, we use a branching process construction.

Rather than fixing the random graph $G(n, p)$ in advance, we use the
principle of deferred decisions to generate it dynamically as the epidemic
progresses. Thus, starting with the initial infective $u$, we put down all
edges from it to other nodes. Then, we decide whether $u$ succeeds in
infecting its neighbours along each of those edges. For each neighbour $v$ so
infected, we repeat the process. Thus, the number of nodes infected by $u$ is
binomial with parameters $n - 1$ and $\beta p$; the number of nodes infected
by each subsequent infective is stochastically dominated by such a binomial
random variable. Thus, the size of the epidemic is bounded above by the size
of a branching process whose offspring distribution is binomial,
$B(n - 1, \beta p)$. The branching process is subcritical by the assumption
that $(n - 1)\beta p < c < 1$, and so it becomes extinct with probability 1,
i.e., its final population size is finite almost surely, and in expectation.
It can be shown directly, using generating functions, that it is bounded
uniformly in $n$. Alternatively, note that if $(n - 1)\beta p = c$ for all
$n$, then the binomial offspring distributions converge in distribution to a
Poisson with parameter $c$ as $n \to \infty$; the population sizes of the
corresponding branching processes also converge, both in distribution and in
expectation. Since $c < 1$, the branching process with Poisson($c$) offspring
distribution is subcritical, and so it has a finite mean population size.
This completes the proof of the lemma. □
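
The dominating branching process is simple to simulate. The sketch below is
an illustration under our own naming and parameter choices: it draws the
total population of a Galton-Watson process with Binomial($n - 1$, $q$)
offspring, where $q$ plays the role of $\beta p$. For any subcritical
offspring law with mean $c < 1$, the expected total population (counting the
ancestor) is $1/(1 - c)$, a standard fact about branching processes.

```python
import random

def total_progeny(n, q, rng, cap=10**6):
    """Total population of a branching process with Binomial(n - 1, q) offspring.

    q stands for beta * p. The cap is a safety stop for this sketch only; it is
    not expected to be reached in the subcritical case (n - 1) * q < 1.
    """
    alive, total = 1, 1
    while alive and total < cap:
        # The offspring of `alive` individuals together are Binomial(alive * (n - 1), q).
        children = sum(1 for _ in range(alive * (n - 1)) if rng.random() < q)
        alive = children
        total += children
    return total

def expected_total_progeny(c):
    """Mean total population when the mean offspring number is c < 1."""
    return 1.0 / (1.0 - c)
```

With $n = 21$ and $q = 0.025$ the mean offspring number is $c = 0.5$, and
the sample mean of `total_progeny` settles near 2 whatever the value of $n$,
as long as $(n - 1)q$ is kept at 0.5.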
3.4 Power law random graphs

There has been considerable interest in power law graphs since it was first
noticed that the Internet AS-level graph exhibits a power law degree
distribution [14]. Briefly, a power law graph is one where the number of
nodes with degree $k$ is proportional to $k^{-\gamma}$ for some $\gamma > 1$.
For the mean degree to be finite, we need $\gamma > 2$ and this is the range
we shall consider. The Internet AS-level graph is characterized by
$\gamma \approx 2.1$.

There have been several different models proposed for graphs with power 
law degree distributions; see, for example, [21 El- In this paper, we consider 
the following model of random graphs on n vertices, introduced in ^T]. Let 
$w = (w_1, w_2, \dots, w_n)$ be a sequence of positive weights assigned to
the nodes of the graph; we assume without loss of generality that
$w_1 \ge w_2 \ge \dots \ge w_n$. The edge between the pair of vertices
$(i, j)$ is present with probability

$$\frac{w_i w_j}{\sum_{k=1}^{n} w_k},$$

independent of all other edges; we assume that
$w_1^2 < \sum_{k=1}^{n} w_k$. The resulting random graph is denoted $G(w)$.
For example, taking $w_i = np$ for all $i$ yields the Erdos-Renyi model with
parameters $n$ and $p$.

It is easy to see that $w_i$ is the expected value of the degree of node $i$;
hence, this model is referred to as the expected degree model.¹ We do not
assume that the $w_i$ are integer-valued. Note that the resulting graph may
have self-loops but it does not have multiple edges. The self-loops do not
affect the spread of the epidemic and are not important to our analysis.
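
A direct way to sample from the expected degree model is to flip one
independent coin per pair of nodes. The sketch below does exactly that; the
function name `expected_degree_graph` and the use of adjacency sets are
choices made for this illustration, and the quadratic running time makes it
suitable only for small graphs.

```python
import random

def expected_degree_graph(w, rng):
    """Sample G(w): edge {i, j} is present independently with probability
    w[i] * w[j] / sum(w). Self-loops (i == j) are allowed, as in the model.

    Assumes max(w) ** 2 <= sum(w), so that every probability is at most 1.
    Returns a dict mapping each node index to the set of its neighbours.
    """
    total = sum(w)
    n = len(w)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i, n):
            if rng.random() < w[i] * w[j] / total:
                adj[i].add(j)
                adj[j].add(i)
    return adj
```

Counting a self-loop once, the expected size of `adj[i]` is
$\sum_j w_i w_j / \sum_k w_k = w_i$, in agreement with the remark above.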

Let $d$ denote the average and $m$ the maximum expected degree. (Thus
$m = w_1$ but it is convenient to distinguish it in the notation as the model
is parametrised by $d$, $m$ and the exponent of the power law degree
distribution.)
Chung and Lu [TT] propose the following explicit power law model for the 
expected degree sequence: 



$$w_i = c (i_0 + i)^{-\frac{1}{\gamma - 1}}, \quad 1 \le i \le n, \qquad (4)$$

where

$$c = \frac{\gamma - 2}{\gamma - 1} \, d \, n^{\frac{1}{\gamma - 1}}, \qquad
i_0 = n \left( \frac{d(\gamma - 2)}{m(\gamma - 1)} \right)^{\gamma - 1}. \qquad (5)$$

¹ Reed and Molloy HUJOn] have studied a model where one conditions on actual
rather than expected degrees. The expected degree model has the advantage
that edges are independent, which makes it much easier to analyse.
The number of nodes with weight bigger than $k$ (equal to the largest $i$
such that $w_i > k$) scales like $k^{1-\gamma}$. Thus, $\gamma$ is the
exponent of the power law distribution of expected node degrees. The weights
$w_i$ are the order statistics of this distribution. The distribution is
shifted by $i_0$ and scaled by $c$, where these constants are chosen so as to
achieve the specified average $d$ and maximum $m$ for the expected degrees.
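
The weight sequence can be generated directly from the model parameters. The
sketch below uses the forms
$w_i = c (i_0 + i)^{-1/(\gamma - 1)}$,
$c = \frac{\gamma - 2}{\gamma - 1} d n^{1/(\gamma - 1)}$ and
$i_0 = n \left( \frac{d(\gamma - 2)}{m(\gamma - 1)} \right)^{\gamma - 1}$,
which is our reading of equations (4) and (5); the function name
`power_law_weights` is hypothetical. Because of the shift $i_0$, the
realised maximum and average are only approximately $m$ and $d$.

```python
def power_law_weights(n, gamma, d, m):
    """Expected degree sequence w_1 >= ... >= w_n for the power law model.

    n     : number of nodes
    gamma : power law exponent, assumed to satisfy gamma > 2
    d     : target average expected degree
    m     : target maximum expected degree
    """
    if gamma <= 2:
        raise ValueError("the model requires gamma > 2")
    c = (gamma - 2.0) / (gamma - 1.0) * d * n ** (1.0 / (gamma - 1.0))
    i0 = n * (d * (gamma - 2.0) / (m * (gamma - 1.0))) ** (gamma - 1.0)
    return [c * (i0 + i) ** (-1.0 / (gamma - 1.0)) for i in range(1, n + 1)]
```

For example, with $n = 10^5$, $\gamma = 2.5$, $d = 10$ and $m = 1000$, the
largest weight comes out a few percent below $m$ and the average a few
percent below $d$.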

The eigenvalues of the adjacency matrix for this random graph model 
have been studied by Chung, Lu and Vu. They show ^21 Theorem 4] that, 
with high probability, the spectral radius of the graph is 



By Theorem 1, if $\beta\rho(A) < 1$, then the size of the epidemic is bounded
by $\sqrt{n}$ times the size of the initial infective population.

We now establish a partial converse. We show that the graph has a core
such that, if $\beta\rho(A) > 1$ and one of the initial infected nodes is in
the core, then the expected size of the epidemic is large. This is analogous
to the situation in the star, where there is a large epidemic when
$\beta\rho(A) > 1$ if the hub is initially infected. For general power law
graphs, we do not know what happens when the initial infectives aren't in the
core.

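It is easy to see from the description above that the expected degree of
node $i$ is precisely $w_i$, and the expected average degree of the graph is
given by $d = \frac{1}{n} \sum_{i=1}^{n} w_i$. We can now write
$p_{ij} = w_i w_j / (nd)$ for the probability that the edge $(i, j)$ is
present.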

It is straightforward to describe the evolution of a Reed-Frost epidemic on
the expected degree random graph model. Consider a single initial infective,
say node $i$. Node $j$ becomes infected at time 1 if edge $(i, j)$ is present
in the random graph and if $i$ infects $j$ in the first time slot; this has
probability $\beta p_{ij}$, and is independent of whether node $i$ infects
some other node $k$. Moreover, node $i$ cannot infect node $j$ in any
subsequent time step since it is removed at time 1. Using the principle of
deferred decisions, we can construct a realisation of the random graph as the
epidemic spreads. It is clear from this construction that the set of nodes
that eventually become infected can be identified with the connected
components containing the initial infectives in the random graph with weight
sequence $\beta w$, namely $G(\beta w)$. Suppose there is a single initial
infective. The question of whether there is a large epidemic is equivalent to
that of whether the random graph $G(\beta w)$ possesses a giant component,
and whether the initial infective belongs to this giant component.




7 >2.5, 

2 < 7 < 2.5. 
If there is more than one initial infective, the final set of removed nodes
is the union of the connected components containing the initial infectives,
in the random graph $G(\beta w)$.

A sufficient condition for the existence of a giant component is derived 
in ^21 Theorem 3]. The condition can be stated in terms of the average 
expected degree d, as follows: 

Theorem 6. For a random graph $G(w)$ with expected degree sequence having
average expected degree $d \ge 1 + \delta > 1$, there is a unique giant
component $C$ such that
$\sum_{i \in C} w_i \ge (1 - c_\delta) \sum_{i \in V} w_i$, where
$c_\delta \in (0, 1)$ is a constant that depends only on $\delta$.

In words, the giant component contains a non-zero fraction of the total 
weight of all nodes. Later, we will show that this implies that it contains a 
non-zero fraction of the nodes. 

We use this result to obtain estimates on the final size of an epidemic
on a power law random graph. Fix $k$ (as a function of $n$) and consider the
subgraph induced by the $k$ nodes with the largest weight in the random graph
$G(\beta w)$. The average expected degree of this subgraph is easily seen to be

\[ d_k = \frac{1}{k} \sum_{i=1}^{k} \sum_{j=1}^{k} \beta p_{ij} . \]

If this is strictly larger than 1, then by Theorem 6 above, this subgraph has
a giant component. We now find conditions on $k$ such that $d_k > 1$.
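We have from (@J) and (0) that

\[ d_k = \frac{\beta}{knd} \Big( \sum_{i=1}^{k} w_i \Big)^2 . \tag{6} \]

Now, if $k \gg i_0$, it follows that

\[ \sum_{i=1}^{k} w_i \sim d\, n^{\frac{1}{\gamma-1}} k^{\frac{\gamma-2}{\gamma-1}} . \tag{7} \]

Substituting this in (6) yields

\[ d_k = \beta d\, n^{\frac{3-\gamma}{\gamma-1}} k^{\frac{\gamma-3}{\gamma-1}} . \tag{8} \]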
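As a check on (8), the substitution can be written out explicitly. The sketch below assumes that the average expected degree of the subgraph has the form $d_k = \frac{\beta}{knd}\big(\sum_{i=1}^{k} w_i\big)^2$, which follows from $p_{ij} = w_i w_j/(nd)$, and that $\sum_{i=1}^{k} w_i \sim d\, n^{1/(\gamma-1)} k^{(\gamma-2)/(\gamma-1)}$ for $k \gg i_0$:

```latex
\begin{align*}
d_k &= \frac{\beta}{knd}
       \Big( d\, n^{\frac{1}{\gamma-1}} k^{\frac{\gamma-2}{\gamma-1}} \Big)^2
     = \beta d\, n^{\frac{2}{\gamma-1} - 1}\, k^{\frac{2(\gamma-2)}{\gamma-1} - 1} \\
    &= \beta d\, n^{\frac{3-\gamma}{\gamma-1}}\, k^{\frac{\gamma-3}{\gamma-1}}
     = \beta d \Big( \frac{n}{k} \Big)^{\frac{3-\gamma}{\gamma-1}} .
\end{align*}
```

The last form makes the monotonicity in $k$ visible at a glance: the exponent of $n/k$ is positive exactly when $\gamma < 3$.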



We now distinguish two cases. Suppose first that $\gamma > 3$. Then $d_k$ is a
non-decreasing function of $k$, and its maximum value, attained at $k = n$, is
$\beta d$. This only yields the weak result that there is a large epidemic if
$\beta d > 1$.

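Suppose next that $2 < \gamma < 3$. Then $d_k$ is a decreasing function of $k$. Fix
$\delta > 0$. Defining $N_\delta$ to be the largest value of $k$ for which
$d_k > 1 + \delta$, we see that

\[ N_\delta = \left\lfloor n \left( \frac{\beta d}{1+\delta} \right)^{\frac{\gamma-1}{3-\gamma}} \right\rfloor + 1, \tag{9} \]

where $\lfloor x \rfloor$ denotes the integer part of $x$. The following result is
now an easy consequence.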

Lemma 7. Let $\beta > 0$ be arbitrarily small. Then the expected size of the
epidemic, starting from an arbitrary initial infective, is bounded below by a
constant multiple of $n$, where the constant may depend on $\beta$. Here the
expectation is taken both with respect to the random realisation of a graph,
and the evolution of the epidemic conditional on the graph.

Proof. Consider the $N_\delta$ nodes of largest weight, where $N_\delta$ is given
by (9). Since $\beta, d > 0$ are constants, $N_\delta$ is a constant multiple of $n$.
By Theorem 6, the random graph $G(\beta w)$ restricted to these nodes contains a
giant component $C$ such that

\[ \sum_{i \in C} w_i \ge (1 - c_\delta) \sum_{i=1}^{N_\delta} w_i \sim (1 - c_\delta)\, d n \left( \frac{\beta d}{1+\delta} \right)^{\frac{\gamma-2}{3-\gamma}} , \tag{10} \]

where we have used (7) and (9) to obtain the last asymptotic equivalence.
Recall that $c_\delta \in (0, 1)$ is a constant that depends on $\delta$. Since
$\beta$, $d$ and $\delta$ are constants, while $\sum_{i=1}^{n} w_i = nd$, equation (10)
tells us that the giant component $C$ contains a constant fraction of the total
weight of the graph. We now deduce that it must also contain a constant fraction
of the total number of nodes. Indeed, for a given weight, the size (in number of
nodes) would be minimised if $C$ contained the highest weight nodes. Thus, we ask
what is the smallest value of $k$ such that $\sum_{i=1}^{k} w_i$ exceeds the weight
of $C$. It follows from (7) and (10) that we require

\[ d\, n^{\frac{1}{\gamma-1}} k^{\frac{\gamma-2}{\gamma-1}} \sim d n (1 - c_\delta) \left( \frac{\beta d}{1+\delta} \right)^{\frac{\gamma-2}{3-\gamma}} , \]

and so,

\[ k \sim n (1 - c_\delta)^{\frac{\gamma-1}{\gamma-2}} \left( \frac{\beta d}{1+\delta} \right)^{\frac{\gamma-1}{3-\gamma}} ; \tag{11} \]

in particular, $k$ is equivalent to a constant multiple of $n$.

Now, if any of the initially infected nodes belongs to the giant component
$C$, then $C$ is a subset of the set of nodes ever infected; hence, the final size
of the epidemic is proportional to $n$. On the other hand, suppose none of
the initial infectives belongs to $C$. Let $i$ be an initially infected node. Now,
the probability that there is an edge between $i$ and $C$ in the random graph
$G(\beta w)$ is given by

\[ p(i, C) = \frac{\beta w_i \sum_{j \in C} w_j}{nd} \ge (1 + o(1)) (1 - c_\delta)\, \beta \left( \frac{\beta d}{1+\delta} \right)^{\frac{\gamma-2}{3-\gamma}} w_i , \]

where the inequality follows from (10). This is bounded below by a positive
constant. Conditional on this edge being present, $C$ is a subset of the set of
eventually infected nodes. Thus, in this case too, the expected final size of
the epidemic is proportional to $n$. This completes the proof of the lemma. □
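The closed forms used in the proof are easy to evaluate numerically. The following Python sketch is illustrative only: the function names are hypothetical, the formulas are those of (8), the leading-order term of (9), and the right-hand side of (11) divided by $n$, and the constant $c_\delta$ of Theorem 6 is not specified in the text, so it is passed in as an assumed input.

```python
def d_k(beta, d, gamma, n, k):
    """Average expected degree (8) of the subgraph on the k heaviest nodes."""
    return beta * d * (n / k) ** ((3.0 - gamma) / (gamma - 1.0))


def n_delta_leading(beta, d, gamma, n, delta):
    """Leading-order value of N_delta in (9), valid for 2 < gamma < 3."""
    return n * (beta * d / (1.0 + delta)) ** ((gamma - 1.0) / (3.0 - gamma))


def node_fraction(beta, d, gamma, delta, c_delta):
    """Right-hand side of (11) divided by n; c_delta is an assumed input."""
    weight_term = (1.0 - c_delta) ** ((gamma - 1.0) / (gamma - 2.0))
    core_term = (beta * d / (1.0 + delta)) ** ((gamma - 1.0) / (3.0 - gamma))
    return weight_term * core_term
```

For instance, with $\gamma = 2.5$, $\beta = 0.5$, $d = 2$, $\delta = 1$ and $n = 8000$, the leading-order value of $N_\delta$ is $n/8 = 1000$, and $d_k$ evaluated there equals $1 + \delta = 2$ up to rounding.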

Let us summarise our findings: If $\gamma > 3$, then there is a large epidemic
if the set of initial infectives contains a high-degree node but not otherwise;
this is analogous to the star network studied earlier, where there is a large
epidemic if the hub is initially infected, but otherwise the probability of a
large epidemic is small. In the next section, we introduce another family of
scale-free networks which will lead to more consistent results.

3.5 Inhomogeneous W-graphs

We are interested in the following family of graphs $G(W)$ with vertex set
$\{1, \ldots, n\}$, where $i$ and $j$ are connected by an edge with probability
$p_{ij}$, independently of all other pairs of nodes, and $p_{ij}$ is defined as
follows: let $\{X_i, i = 1, \ldots, n\}$ be a sequence of iid random variables
uniformly distributed on $[0, 1]$; then $p_{ij} = W(X_i, X_j)$, where
$W : [0, 1]^2 \to [0, 1]$ is measurable and symmetric. This model has been
introduced in ^J] and generalised in [3] to a larger family of inhomogeneous
networks. For example, $W(x, y) = p$ for all $x, y$ gives the classical
Erdős-Rényi random graph. For more details on this family of graphs we refer
the reader to [3]. Our main interest in what follows is to understand the
emergence of the giant component.
Let $T_W$ be the integral operator with kernel $nW$ defined by

\[ (T_W f)(x) = \int_0^1 n W(x, y) f(y)\, dy , \]

and define

\[ \|T_W\| = \sup\{ \|T_W f\|_2 : f \ge 0, \ \|f\|_2 \le 1 \} . \]

As previously, we check that starting with a $G(W)$ graph, the epidemic graph
is described by $G(\beta W)$.

The next Theorem Corollary 3.2] characterises the emergence of the 
giant component in this family of graphs. 

Theorem 8. Consider the graph $G(W)$. The threshold for the existence
of the giant component is given by $\|T_W\| = 1$. More precisely,

• if $\|T_W\| < 1$, then the size of the largest component is negligible with
respect to $n$ almost surely,

• if $\|T_W\| > 1$ and $W$ is irreducible$^2$, then the largest component consists
of a non-zero fraction of $n$.

Hence, $\|T_W\|^{-1}$ is the threshold for the final size of the epidemic on a
$G(W)$ graph, i.e. if $\beta < \|T_W\|^{-1}$ there is a negligible fraction of
removed nodes, whereas if $\beta > \|T_W\|^{-1}$ there is a non-zero fraction of
removed nodes. We are now going to give explicit computations in the case of
scale-free networks.
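Before turning to the scale-free case, the threshold can be sanity-checked on the simplest kernel. The following worked example assumes a constant kernel $W(x, y) = c/n$ with $c > 0$ fixed (the sparse Erdős-Rényi case; the constant $c$ is introduced here only for illustration):

```latex
\begin{align*}
(T_W f)(x) &= \int_0^1 n \cdot \frac{c}{n}\, f(y)\, dy = c \int_0^1 f(y)\, dy , \\
\|T_W f\|_2 &= c \left| \int_0^1 f(y)\, dy \right| \le c\, \|f\|_2
  \quad \text{(Cauchy-Schwarz, with equality for } f \equiv 1 \text{)} , \\
\|T_W\| &= c , \qquad \text{so the epidemic threshold is } \beta = 1/c .
\end{align*}
```

Under this assumption the condition $\beta > \|T_W\|^{-1}$ reads $\beta c > 1$, which is the classical requirement that the thinned graph have average degree larger than 1.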

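Let us examine the case where

\[ W(x, y) = \frac{W(x) W(y)}{n \int_0^1 W(u)\, du} \]

and such that for $X$ uniformly distributed on $[0, 1]$, the variable $W(X)$ follows
a Pareto (power-law) distribution, i.e.

\[ P(W(X) \in (t, t + dt)) = \frac{\gamma}{(1+t)^{1+\gamma}}\, dt , \quad t \in [0, \infty) . \tag{12} \]

This yields $W(x) = (1 - x)^{-1/\gamma} - 1$ and if we assume that $\gamma > 2$, we have

\[ W(x, y) = \frac{\gamma - 1}{n} \big( (1 - x)^{-1/\gamma} - 1 \big) \big( (1 - y)^{-1/\gamma} - 1 \big) . \]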



2 It cannot be split into two disjoint components 
Let us now go back to the operator $T_W$; our goal is to compute the $L^2$-norm
of

\[ (T_W f)(x) = \int_0^1 (\gamma - 1) \big( (1 - x)^{-1/\gamma} - 1 \big) \big( (1 - y)^{-1/\gamma} - 1 \big) f(y)\, dy . \]

It is not difficult to see that the maximum is reached for

\[ f(y) = \frac{W(y)}{\sqrt{\int_0^1 W(u)^2\, du}} \]

and thus

\[ \|T_W\| = (\gamma - 1) \int_0^1 W(y)^2\, dy = \frac{2}{\gamma - 2} . \]
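The value of the norm can be verified by a direct computation. The sketch below uses $W(x) = (1 - x)^{-1/\gamma} - 1$, the substitution $u = 1 - x$, and the elementary integral $\int_0^1 u^{-a}\, du = 1/(1 - a)$ for $a < 1$, which is where the assumption $\gamma > 2$ enters:

```latex
\begin{align*}
\int_0^1 W(u)\, du &= \int_0^1 \big( u^{-1/\gamma} - 1 \big)\, du
  = \frac{\gamma}{\gamma - 1} - 1 = \frac{1}{\gamma - 1} , \\
\int_0^1 W(u)^2\, du &= \int_0^1 \big( u^{-2/\gamma} - 2 u^{-1/\gamma} + 1 \big)\, du
  = \frac{\gamma}{\gamma - 2} - \frac{2\gamma}{\gamma - 1} + 1
  = \frac{2}{(\gamma - 1)(\gamma - 2)} , \\
(\gamma - 1) \int_0^1 W(u)^2\, du &= \frac{2}{\gamma - 2} .
\end{align*}
```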

Applying the above results to this specific $W$-graph, we see that there will be
a large outbreak if we have a giant component in the corresponding graph, that
is to say if $\frac{2\beta}{\gamma - 2} > 1$. Hence if $\beta > \frac{\gamma - 2}{2}$
then the epidemic will eventually reach a proportion $r > 0$ of the total
population. As a by-product of [3, Theorem 6.2], it turns out that $r$ is related
to the solution of the following functional equation

\[ f = 1 - e^{-T_W f} . \tag{13} \]

We omit this last computation as it is a bit tedious and does not bring any
further insight into the model. As previously we can conclude that if
$\beta > \frac{\gamma - 2}{2}$ then the final size of the epidemic is bounded below
by $(1 + o(1)) r^2 n$, where $r$ is related to the solution of the functional
equation (13).
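Although the computation is omitted here, a rough numerical sketch of how such a functional equation can be solved may help fix ideas. The code below rests on several assumptions that are not taken from the text: the operator for the epidemic graph $G(\beta W)$ is taken to be $\beta T_W$, so the equation solved is $f = 1 - e^{-\beta T_W f}$; because the kernel has product form, $f$ is determined by the scalar $a = \int_0^1 W(y) f(y)\, dy$; and $r$ is read as $\int_0^1 f(x)\, dx$. The function name `outbreak_fraction` is hypothetical, and the midpoint rule is crude near $x = 1$ where $W$ is unbounded.

```python
import math


def outbreak_fraction(beta, gamma, grid=4000, tol=1e-12, max_iter=1000):
    """Approximate the integral of the largest solution of f = 1 - exp(-beta*T_W f).

    Uses W(x) = (1 - x)**(-1/gamma) - 1 and the rank-one reduction
    a = integral of W(x) * (1 - exp(-beta*(gamma-1)*W(x)*a)) over [0, 1].
    """
    # Midpoint grid avoids evaluating W at the singular endpoint x = 1.
    xs = [(j + 0.5) / grid for j in range(grid)]
    w = [(1.0 - x) ** (-1.0 / gamma) - 1.0 for x in xs]
    c = beta * (gamma - 1.0)
    a = sum(w) / grid  # start from f = 1, so iterates decrease monotonically
    for _ in range(max_iter):
        # -expm1(-z) equals 1 - exp(-z) but is accurate for small z.
        a_new = sum(wi * -math.expm1(-c * wi * a) for wi in w) / grid
        converged = abs(a_new - a) < tol
        a = a_new
        if converged:
            break
    return sum(-math.expm1(-c * wi * a) for wi in w) / grid
```

With $\gamma = 2.5$ the threshold is $(\gamma - 2)/2 = 0.25$; under the assumptions above, the iteration collapses to zero for $\beta = 0.1$ and settles at a strictly positive value for $\beta = 0.9$.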

Finally note that if $T$ is a random variable with probability distribution
given by (12), then the epidemic threshold corresponds to the ratio
$E(T)/E(T^2)$. Moreover this gives a stronger result than the one derived in
paragraph 3.4, since $\frac{\gamma - 2}{2} < \gamma - 1$, where $\frac{1}{\gamma - 1}$
is the average degree $d$.
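The moment identity behind this remark can be checked with a few lines of Python. This is a small sketch with hypothetical function names; the closed-form moments are those of the density $\gamma/(1+t)^{1+\gamma}$ in (12) and require $\gamma > 2$.

```python
def pareto_moments(gamma):
    """Return (E[T], E[T^2]) for the density gamma / (1 + t)**(1 + gamma)."""
    m1 = 1.0 / (gamma - 1.0)
    m2 = 2.0 / ((gamma - 1.0) * (gamma - 2.0))
    return m1, m2


def epidemic_threshold(gamma):
    """Threshold E[T] / E[T^2], which simplifies to (gamma - 2) / 2."""
    m1, m2 = pareto_moments(gamma)
    return m1 / m2


def mean_degree_threshold(gamma):
    """Threshold 1/d from the weaker condition beta * d > 1, with d = E[T]."""
    m1, _ = pareto_moments(gamma)
    return 1.0 / m1
```

For every $\gamma > 2$ the first threshold is smaller than the second, which is the sense in which the result of this section is stronger.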



4 Conclusion 

Probabilistic methods and tools offer a powerful set of analytical techniques 
to understand the spread of epidemics. Such techniques were used in this 
paper to gain further insight into models, which typically have been inves- 
tigated through mean-field approximations and simulation studies. Let us 
recapitulate our key results and pinpoint some further directions of research. 

We derived a threshold for a small outbreak and showed that it is indeed
close to the best possible in general, as demonstrated by the example of the
star-shaped network. We then took advantage of results characterising the
giant component for various families of graphs to give a lower bound on the
number of nodes ultimately removed. The Reed-Frost model represents a
starting point and it would be useful to extend our analysis to exponentially
distributed and general infectious periods. Finally, we intend to pursue this
analysis for other epidemic models and other classes of topologies.